CKMR: A general overview
Paul B. Conn
The Wildlife Society CKMR Workshop, Sunday November 6, 2022
Paul Conn: Research statistician with the Marine Mammal Laboratory at NOAA Alaska Fisheries Science Center.
Eric Anderson: Research geneticist at NOAA’s Southwest Fisheries Science Center.
Other acknowledgments: Mark Bravington (CSIRO); Brian Taras, Lori Quakenbush (ADF&G)
8:00 - 8:45 Close-kin mark-recapture: An overview (P. Conn)
8:45 - 9:30 An introduction to genetic data and inheritance (E. Anderson)
9:30 - 9:45 Break
9:45 - 10:30 Statistical inference for CKMR abundance estimation (P. Conn)
10:30 - 11:15 Kin finding (E. Anderson)
11:15 - 12:00 Designing a CKMR study
12:00 - 1:00 Lunch
1:00 - 5:00 R/TMB labs (full day participants only)
Note for lab participants: you should have followed the “Setting up your computer” instructions in the workshop book!
Slides for morning lectures: https://github.com/eriqande/tws-ckmr-2022/tree/main/slides
“Book” for afternoon labs: https://eriqande.github.io/tws-ckmr-2022/
General workshop github repository: https://github.com/eriqande/tws-ckmr-2022
A CKMR website w/ more examples: https://closekin.github.io/
Sample occasion 1: mark \(n\) animals (blue) out of a population of \(N\) animals
Sample occasion 2: capture \(M\) animals, \(m\) of which were previously marked
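For comparison with the kinship-based approach, the two-occasion design above leads to the classical Lincoln-Petersen estimator, \(\hat{N} = nM/m\). The slides do not name this estimator; it is included here as the standard textbook result for this design. A minimal Python sketch, with made-up numbers and a hypothetical function name:

```python
def lincoln_petersen(n_marked, n_captured, n_recaptured):
    """Classical two-occasion abundance estimate: N_hat = n * M / m."""
    if n_recaptured <= 0:
        raise ValueError("need at least one recapture to form the estimate")
    return n_marked * n_captured / n_recaptured

# Hypothetical numbers, for illustration only
n, M, m = 100, 80, 20
N_hat = lincoln_petersen(n, M, m)
print(N_hat)
```

The intuition carries over to CKMR: the proportion of "recaptures" (there, kin pairs among comparisons) shrinks as the population grows.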
\(\color{blue}{\text{Mark-recapture}}\)
\(\color{blue}{\text{CKMR}}\)
A framework for estimating adult abundance and survival using the frequency of observed kinship relationships
Parent-offspring pairs (POPs): adult abundance and reproductive schedules (assuming age is known…)
Half-sibling pairs (HSPs): adult abundance and survival (again assuming ages are known)
Compare each genotyped sample to all of the others. We can then maximize the pseudo-likelihood
\(\prod_i \prod_{j>i} \left[ p_{ij} y_{ij} + (1-p_{ij}) (1-y_{ij}) \right]\)
\(y_{ij}\) is a binary random variable taking the value 1 if animals \(i\) and \(j\) are a match and 0 otherwise.
\(p_{ij}\) is the probability of a match
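Because the product runs over a very large number of pairs, it is usually evaluated on the log scale. The following is a minimal Python sketch of the log pseudo-likelihood, assuming the match probabilities \(p_{ij}\) and match indicators \(y_{ij}\) are already available as square arrays. The function name and the toy values are illustrative only and are not taken from the workshop labs.

```python
import math

def log_pseudo_likelihood(p, y):
    """Sum over pairs j > i of log[p_ij * y_ij + (1 - p_ij) * (1 - y_ij)].

    p and y are square nested lists indexed [i][j]; only the upper
    triangle (j > i) is used, so each pair is counted once.
    A pair with p_ij = 0 but y_ij = 1 has zero likelihood and will
    raise a ValueError from math.log.
    """
    total = 0.0
    n = len(p)
    for i in range(n):
        for j in range(i + 1, n):
            term = p[i][j] * y[i][j] + (1.0 - p[i][j]) * (1.0 - y[i][j])
            total += math.log(term)
    return total

# Toy example: 3 samples, one observed match (samples 0 and 1)
p = [[0, 0.01, 0.01],
     [0, 0,    0.01],
     [0, 0,    0]]
y = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 0]]
ll = log_pseudo_likelihood(p, y)
print(ll)
```

In a real analysis the \(p_{ij}\) would be functions of the parameters of interest (abundance, survival), and an optimizer would search for the parameter values that maximize this quantity.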
But how do we figure out what the \(p_{ij}\) probabilities are? And how are these related to what we care about (abundance and survival)?
- Depends on what type of relationship is being considered, sex of parent, etc.
- Calculations rely on expected relative reproductive output (ERRO)
Lexis diagrams are helpful!
\[\begin{equation*} p_{ij} = \begin{cases} 0, & \text{if}\ a_i(b_j) < a_{mat} \\ 1/N_{b_j}^F, & \text{otherwise} \end{cases} \end{equation*}\]
In words: the probability of a mother-offspring pair is zero if the potential mother was reproductively immature at the time of \(j\)’s birth. If the potential mother was reproductively mature, it is simply 1 over the number of reproductively mature females.
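The case formula above translates almost directly into code. This is a minimal Python sketch, assuming birth years are known for both animals and that the number of mature females is supplied per year. The function and argument names are hypothetical, and, like the formula, the sketch checks nothing beyond maturity at the time of \(j\)'s birth.

```python
def mop_probability(birth_year_i, birth_year_j, age_at_maturity, n_mature_females):
    """Probability that female i is the mother of j under the simple model above.

    n_mature_females maps year -> number of reproductively mature females
    (N^F in the formula). All names here are illustrative.
    """
    age_i_at_js_birth = birth_year_j - birth_year_i  # a_i(b_j)
    if age_i_at_js_birth < age_at_maturity:
        return 0.0  # i was reproductively immature when j was born
    return 1.0 / n_mature_females[birth_year_j]

# Hypothetical values: 500 mature females in 2010, maturity at age 5
N_F = {2010: 500}
p_mature = mop_probability(2000, 2010, 5, N_F)    # i was 10 when j was born
p_immature = mop_probability(2008, 2010, 5, N_F)  # i was 2 when j was born
print(p_mature, p_immature)
```

Probabilities of this kind are what would be plugged in as \(p_{ij}\) when evaluating the pseudo-likelihood for mother-offspring comparisons.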
Accurate genotyping (no false positives!)
Population and sampling model is accurate
Kinship comparisons are “independent” (or close enough…)
No heterogeneity in kinship probabilities that can’t be explained by observed (or inferred) covariates
Age
Spatial location
Status (Mating hierarchy)
We need enough genetic markers to tell apart various kin groups. For parent-offspring pairs we might only need 200 SNPs or so, but for half-siblings it is nice to have 3-4K (after pruning ill-behaved loci).
\(\color{red}{\rightarrow \text{High quality tissue samples}}\)
For species where reproductive maturity is not instantaneous, we need to model pre-adult population dynamics, so we need some idea of early survival and reproductive schedules (decent early life history information!). We also need to get the underlying Leslie matrix right (pre vs. postbreeding census, etc.)
Accurate sampling models have more to do with independent fates. For example, we won’t want to include comparisons between mothers and offspring harvested in the same year, because their fates are unlikely to be independent.
The quality of the pseudo-likelihood as an approximation decreases as the amount of relatedness in a population increases. The usual effect when this happens in statistics is that precision (e.g., confidence intervals) is overstated.
CKMR has been conducted on populations as small as \(\approx 600\), but we don’t want to go super low.
No heterogeneity in kinship probabilities that can’t be explained by observed (or inferred) covariates
If covariates are available, and important, they should be modeled! In some cases, e.g., heterogeneity in reproductive success due to dominance, we might need to leave out father-offspring comparisons or model them differently somehow (see bearded seal example).
One strategy is to omit certain categories of comparison (e.g., only making cross-cohort comparisons)
Populations that are “not too big and not too small” (e.g., several hundred to ten million or so)
Need \(\approx 50\) kin pairs to produce reasonable estimates; the required number of samples increases with \(\sqrt{N}\)
Decent genetic variation (severe inbreeding may make it difficult to discriminate different kin pair types)
Good “mixing” (either through movement or through sampling)
Group living species
One mother and one father! No weird breeding systems (e.g., armadillos)
Ages are helpful!
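The square-root scaling of sample size with \(N\) noted in the list above follows from counting comparisons: \(n\) samples give \(n(n-1)/2\) pairwise comparisons, and if each comparison has a kin probability on the order of \(1/N\), the expected number of kin pairs is roughly proportional to \(n^2/N\). The Python sketch below makes this concrete under a deliberately crude assumption that is not from the slides: every comparison has the same probability \(c/N\), where \(c\) is a hypothetical constant standing in for all the age, sex, and life-history detail.

```python
import math

def samples_needed(N, target_pairs=50, c=1.0):
    """Smallest n with n * (n - 1) / 2 * (c / N) >= target_pairs.

    Crude illustration only: assumes every pairwise comparison has the
    same kin probability c / N (c is a made-up constant).
    """
    k = 2.0 * target_pairs * N / c
    # Positive root of n^2 - n - k = 0, rounded up to a whole sample
    n = (1.0 + math.sqrt(1.0 + 4.0 * k)) / 2.0
    return math.ceil(n)

n_small = samples_needed(10_000)
n_large = samples_needed(1_000_000)
# A 100-fold larger population needs roughly 10-fold more samples
print(n_small, n_large, n_large / n_small)
```

A real design calculation would use the actual \(p_{ij}\) for the planned comparisons rather than a single constant, but the \(\sqrt{N}\) behavior is the same.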
Some will require case-specific developments (philopatry, spatial structure, pair bonding)
Skill level probably depends on what type of data (e.g., POP-only, POP+HSP, single cohort vs. multiple cohort)
Relatively low cost, especially after markers and aging methods are developed (epigenetics?)
You’re going to want to have a biologist, a biometrician, and a geneticist involved. Very few people have all of these skills and it’s a lot to ask of a single person (especially a grad student!!!)
Many models will need to be population- and data-dependent and will require bespoke code. That said, there are examples and templates out there that will help.
CKMR “looks backwards”: inference is based on ERRO at the time of each offspring’s birth
Precision tends to be best “back in time” - precision in present day not usually as good (especially for long-lived species; see beluga example here)
Implications for monitoring/management
There are sometimes ways to help improve precision in the present by designing a CKMR experiment correctly!
Paper in prep (Taras, Conn, Quakenbush, Bravington, Baylis). Annual sampling of bearded seal subsistence harvests (tissue samples + teeth) by ADF&G
Our paper isn’t published yet so I can’t make data public yet. So I “adjusted” a few data points so it still tells the same general story but can’t be scooped because it’s not real data.